Results 1 - 20 of 30
1.
Nat Biotechnol ; 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38653796

ABSTRACT

In recent years, generative protein sequence models have been developed to sample novel sequences. However, predicting whether generated proteins will fold and function remains challenging. We evaluate a set of 20 diverse computational metrics to assess the quality of enzyme sequences produced by three contrasting generative models: ancestral sequence reconstruction, a generative adversarial network and a protein language model. Focusing on two enzyme families, we expressed and purified over 500 natural and generated sequences with 70-90% identity to the most similar natural sequences to benchmark computational metrics for predicting in vitro enzyme activity. Over three rounds of experiments, we developed a computational filter that improved the rate of experimental success by 50-150%. The proposed metrics and models will drive protein engineering research by serving as a benchmark for generative protein sequence models and helping to select active variants for experimental testing.

2.
Cell Syst ; 15(3): 286-294.e2, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38428432

ABSTRACT

Pretrained protein sequence language models have been shown to improve the performance of many prediction tasks and are now routinely integrated into bioinformatics tools. However, these models largely rely on the transformer architecture, which scales quadratically with sequence length in both run-time and memory. Therefore, state-of-the-art models have limitations on sequence length. To address this limitation, we investigated whether convolutional neural network (CNN) architectures, which scale linearly with sequence length, could be as effective as transformers in protein language models. With masked language model pretraining, CNNs are competitive with, and occasionally superior to, transformers across downstream applications while maintaining strong performance on sequences longer than those allowed in the current state-of-the-art transformer models. Our work suggests that computational efficiency can be improved without sacrificing performance, simply by using a CNN architecture instead of a transformer, and emphasizes the importance of disentangling pretraining task and model architecture. A record of this paper's transparent peer review process is included in the supplemental information.


Subjects
Computational Biology; Neural Networks, Computer; Amino Acid Sequence; Peer Review
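The scaling argument in the abstract above can be illustrated with a rough back-of-the-envelope sketch. The two functions below are hypothetical and count only the dominant terms: full self-attention scores every pair of positions, while a 1D convolution looks at a fixed window around each position. The kernel width of 5 is an assumption chosen for illustration, not a value taken from the paper.

```python
def attention_pair_count(seq_len: int) -> int:
    """Number of position pairs scored by full self-attention: grows as L^2."""
    return seq_len * seq_len


def conv_window_count(seq_len: int, kernel_width: int = 5) -> int:
    """Number of (position, window offset) pairs in a 1D convolution: grows as L * k."""
    return seq_len * kernel_width


if __name__ == "__main__":
    # Doubling the sequence length quadruples the attention count
    # but only doubles the convolution count.
    for length in (512, 1024, 2048):
        print(length, attention_pair_count(length), conv_window_count(length))
```

Real models add constant factors (hidden size, number of layers, dilation) that this sketch ignores; it only shows why the gap widens as sequences get longer.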
3.
Nat Commun ; 15(1): 1059, 2024 Feb 05.
Article in English | MEDLINE | ID: mdl-38316764

ABSTRACT

The ability to computationally generate novel yet physically foldable protein structures could lead to new biological discoveries and new treatments targeting yet incurable diseases. Despite recent advances in protein structure prediction, directly generating diverse, novel protein structures from neural networks remains difficult. In this work, we present a diffusion-based generative model that generates protein backbone structures via a procedure inspired by the natural folding process. We describe a protein backbone structure as a sequence of angles capturing the relative orientation of the constituent backbone atoms, and generate structures by denoising from a random, unfolded state towards a stable folded structure. Not only does this mirror how proteins natively twist into energetically favorable conformations, the inherent shift and rotational invariance of this representation crucially alleviates the need for more complex equivariant networks. We train a denoising diffusion probabilistic model with a simple transformer backbone and demonstrate that our resulting model unconditionally generates highly realistic protein structures with complexity and structural patterns akin to those of naturally-occurring proteins. As a useful resource, we release an open-source codebase and trained models for protein structure diffusion.


Subjects
Protein Folding; Proteins; Proteins/metabolism; Neural Networks, Computer; Protein Conformation
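One practical detail of working with the angle representation described in the abstract above is that angles are periodic, so noise has to be wrapped back onto the circle. The sketch below is a simplified illustration of that single step, assuming angles in radians and plain additive Gaussian noise. The function names and the sample values are hypothetical, and this is not the authors' noise schedule or training procedure.

```python
import math
import random


def wrap_angle(theta: float) -> float:
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def add_angular_noise(angles, sigma, rng):
    """Add Gaussian noise to each angle, then wrap the result back onto the circle."""
    return [wrap_angle(a + rng.gauss(0.0, sigma)) for a in angles]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    backbone = [-1.05, -0.79, 2.36, 2.62]  # made-up angle values in radians
    print(add_angular_noise(backbone, sigma=0.5, rng=rng))
```

Because each value is an internal angle rather than a Cartesian coordinate, translating or rotating the whole structure leaves the list unchanged, which is the invariance the abstract refers to.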
4.
Protein Eng Des Sel ; 36, 2023 Jan 21.
Article in English | MEDLINE | ID: mdl-37883472

ABSTRACT

Self-supervised pretraining on protein sequences has led to state-of-the-art performance on protein function and fitness prediction. However, sequence-only methods ignore the rich information contained in experimental and predicted protein structures. Meanwhile, inverse folding methods reconstruct a protein's amino-acid sequence given its structure, but do not take advantage of sequences that do not have known structures. In this study, we train a masked inverse folding protein masked language model parameterized as a structured graph neural network. During pretraining, this model learns to reconstruct corrupted sequences conditioned on the backbone structure. We then show that using the outputs from a pretrained sequence-only protein masked language model as input to the inverse folding model further improves pretraining perplexity. We evaluate both of these models on downstream protein engineering tasks and analyze the effect of using information from experimental or predicted structures on performance.


Subjects
Protein Engineering; Protein Folding; Amino Acid Sequence
5.
ArXiv ; 2023 May 26.
Article in English | MEDLINE | ID: mdl-37292483

ABSTRACT

Directed evolution of proteins has been the most effective method for protein engineering. However, a new paradigm is emerging, fusing the library generation and screening approaches of traditional directed evolution with computation through the training of machine learning models on protein sequence fitness data. This chapter highlights successful applications of machine learning to protein engineering and directed evolution, organized by the improvements that have been made with respect to each step of the directed evolution cycle. Additionally, we provide an outlook for the future based on the current direction of the field, namely in the development of calibrated models and in incorporating other modalities, such as protein structure.

7.
PLoS Comput Biol ; 19(5): e1011162, 2023 05.
Article in English | MEDLINE | ID: mdl-37220151

ABSTRACT

Natural products are chemical compounds that form the basis of many therapeutics used in the pharmaceutical industry. In microbes, natural products are synthesized by groups of colocalized genes called biosynthetic gene clusters (BGCs). With advances in high-throughput sequencing, there has been an increase of complete microbial isolate genomes and metagenomes, from which a vast number of BGCs are undiscovered. Here, we introduce a self-supervised learning approach designed to identify and characterize BGCs from such data. To do this, we represent BGCs as chains of functional protein domains and train a masked language model on these domains. We assess the ability of our approach to detect BGCs and characterize BGC properties in bacterial genomes. We also demonstrate that our model can learn meaningful representations of BGCs and their constituent domains, detect BGCs in microbial genomes, and predict BGC product classes. These results highlight self-supervised neural networks as a promising framework for improving BGC prediction and classification.


Subjects
Biological Products; Genome, Bacterial; Metagenome; Multigene Family/genetics; Biological Products/metabolism; Supervised Machine Learning
8.
Nat Comput Sci ; 3(5): 366-367, 2023 May.
Article in English | MEDLINE | ID: mdl-38177841
9.
PLoS Comput Biol ; 18(5): e1010045, 2022 05.
Article in English | MEDLINE | ID: mdl-35500014

ABSTRACT

Identifying structural differences among proteins can be a non-trivial task. When contrasting ensembles of protein structures obtained from molecular dynamics simulations, biologically-relevant features can be easily overshadowed by spurious fluctuations. Here, we present SINATRA Pro, a computational pipeline designed to robustly identify topological differences between two sets of protein structures. Algorithmically, SINATRA Pro works by first taking in the 3D atomic coordinates for each protein snapshot and summarizing them according to their underlying topology. Statistically significant topological features are then projected back onto a user-selected representative protein structure, thus facilitating the visual identification of biophysical signatures of different protein ensembles. We assess the ability of SINATRA Pro to detect minute conformational changes in five independent protein systems of varying complexities. In all test cases, SINATRA Pro identifies known structural features that have been validated by previous experimental and computational studies, as well as novel features that are also likely to be biologically-relevant according to the literature. These results highlight SINATRA Pro as a promising method for facilitating the non-trivial task of pattern recognition in trajectories resulting from molecular dynamics simulations, with substantially increased resolution.


Subjects
Data Science; Molecular Dynamics Simulation; Biophysics; Protein Conformation; Proteins/chemistry
10.
Cell Syst ; 13(4): 274-285.e6, 2022 04 20.
Article in English | MEDLINE | ID: mdl-35120643

ABSTRACT

The degree to which evolution is predictable is a fundamental question in biology. Previous attempts to predict the evolution of protein sequences have been limited to specific proteins and to small changes, such as single-residue mutations. Here, we demonstrate that by using a protein language model to predict the local evolution within protein families, we recover a dynamic "vector field" of protein evolution that we call evolutionary velocity (evo-velocity). Evo-velocity generalizes to evolution over vastly different timescales, from viral proteins evolving over years to eukaryotic proteins evolving over geologic eons, and can predict the evolutionary dynamics of proteins that were not used to develop the original model. Evo-velocity also yields new evolutionary insights by predicting strategies of viral-host immune escape, resolving conflicting theories on the evolution of serpins, and revealing a key role of horizontal gene transfer in the evolution of eukaryotic glycolysis.


Subjects
Evolution, Molecular; Language; Amino Acid Sequence; Mutation/genetics; Proteins/genetics
11.
PLoS Comput Biol ; 18(2): e1009853, 2022 02.
Article in English | MEDLINE | ID: mdl-35143485

ABSTRACT

Biocatalysis is a promising approach to sustainably synthesize pharmaceuticals, complex natural products, and commodity chemicals at scale. However, the adoption of biocatalysis is limited by our ability to select enzymes that will catalyze their natural chemical transformation on non-natural substrates. While machine learning and in silico directed evolution are well-posed for this predictive modeling challenge, efforts to date have primarily aimed to increase activity against a single known substrate, rather than to identify enzymes capable of acting on new substrates of interest. To address this need, we curate 6 different high-quality enzyme family screens from the literature that each measure multiple enzymes against multiple substrates. We compare machine learning-based compound-protein interaction (CPI) modeling approaches from the literature used for predicting drug-target interactions. Surprisingly, comparing these interaction-based models against collections of independent (single task) enzyme-only or substrate-only models reveals that current CPI approaches are incapable of learning interactions between compounds and proteins in the current family level data regime. We further validate this observation by demonstrating that our no-interaction baseline can outperform CPI-based models from the literature used to guide the discovery of kinase inhibitors. Given the high performance of non-interaction based models, we introduce a new structure-based strategy for pooling residue representations across a protein sequence. Altogether, this work motivates a principled path forward in order to build and evaluate meaningful predictive models for biocatalysis and other drug discovery applications.


Subjects
Machine Learning; Proteins; Amino Acid Sequence; Drug Discovery; Proteins/chemistry; Substrate Specificity
12.
J Endourol ; 36(2): 203-208, 2022 02.
Article in English | MEDLINE | ID: mdl-34663087

ABSTRACT

Objectives: To demonstrate feasibility of robot-assisted laparoscopic (RAL) ureteroureterostomy (UU) for benign distal ureteral strictures (DUS) in our robotic reconstruction series with long-term follow-up. Patients and Methods: In a retrospective review of our prospectively maintained RAL ureteral reconstruction database, we followed patients between June 2012 and February 2019 who underwent a UU for DUS. In addition to patient demographics, we recorded the etiology, stricture length, and recurrence rates. Recurrence was defined as findings of recurrent or persistent obstruction by postoperative mercaptoacetyltriglycine diuretic renal scan or the need for additional intervention with ureteral drainage or revisional surgery. Results: We identified 22 patients who underwent a RAL-UU for DUS of benign etiologies. Median age was 42 years (interquartile range [IQR] 39-57) and 20 of 22 patients (90.1%) were women. Median stricture length was 1.5 cm (IQR 1-2). Iatrogenic surgical injury was noted in 16 patients (73%). All ureteral reconstruction was performed using RAL. Postoperative imaging consisted of renal ultrasonography, diuretic renal scan, or cross-sectional radiology within 3 months of the index operation. Further imaging was dependent on clinical judgment. Twenty patients (90.1%) had success with median follow-up time of 54.6 months with two recurrences necessitating RAL ureteroneocystostomy (UNC). Conclusion: RAL-UU for DUS is technically viable and shows promising efficacy in properly selected patients. This technique may serve a niche for preserving the natural anatomical drainage of the bladder and ureter in addition to obviating the sequela of vesicoureteral reflux as seen in UNC.


Subjects
Laparoscopy; Robotic Surgical Procedures; Robotics; Ureter; Ureteral Obstruction; Adult; Constriction, Pathologic/complications; Constriction, Pathologic/surgery; Cross-Sectional Studies; Female; Follow-Up Studies; Humans; Laparoscopy/methods; Retrospective Studies; Robotic Surgical Procedures/adverse effects; Treatment Outcome; Ureter/surgery; Ureteral Obstruction/etiology; Ureteral Obstruction/surgery
13.
Curr Opin Struct Biol ; 72: 145-152, 2022 02.
Article in English | MEDLINE | ID: mdl-34896756

ABSTRACT

Machine-learning models that learn from data to predict how protein sequence encodes function are emerging as a useful protein engineering tool. However, when using these models to suggest new protein designs, one must deal with the vast combinatorial complexity of protein sequences. Here, we review how to use a sequence-to-function machine-learning surrogate model to select sequences for experimental measurement. First, we discuss how to select sequences through a single round of machine-learning optimization. Then, we discuss sequential optimization, where the goal is to discover optimized sequences and improve the model across multiple rounds of training, optimization, and experimental measurement.


Subjects
Machine Learning; Protein Engineering; Amino Acid Sequence; Proteins
14.
Transl Androl Urol ; 10(5): 2171-2177, 2021 May.
Article in English | MEDLINE | ID: mdl-34159099

ABSTRACT

Since the advent of robotic surgery, its implementation in urology has been both wide and rapid. Particularly in extirpative surgery for prostate cancer, techniques in robotic-assisted radical prostatectomy have evolved, and continue to evolve, to maximize functional and oncologic outcomes. In this review, we briefly present a historical perspective of the evolution of various robotic techniques, allowing us to contextualize contemporary robotic approaches to radical prostatectomy.

15.
Curr Opin Chem Biol ; 65: 18-27, 2021 12.
Article in English | MEDLINE | ID: mdl-34051682

ABSTRACT

Protein engineering seeks to identify protein sequences with optimized properties. When guided by machine learning, protein sequence generation methods can draw on prior knowledge and experimental efforts to improve this process. In this review, we highlight recent applications of machine learning to generate protein sequences, focusing on the emerging field of deep generative methods.


Subjects
Machine Learning; Protein Engineering; Amino Acid Sequence
16.
Curr Protoc ; 1(5): e113, 2021 May.
Article in English | MEDLINE | ID: mdl-33961736

ABSTRACT

Models from machine learning (ML) or artificial intelligence (AI) increasingly assist in guiding experimental design and decision making in molecular biology and medicine. Recently, Language Models (LMs) have been adapted from Natural Language Processing (NLP) to encode the implicit language written in protein sequences. Protein LMs show enormous potential in generating descriptive representations (embeddings) for proteins from just their sequences, in a fraction of the time with respect to previous approaches, yet with comparable or improved predictive ability. Researchers have trained a variety of protein LMs that are likely to illuminate different angles of the protein language. By leveraging the bio_embeddings pipeline and modules, simple and reproducible workflows can be laid out to generate protein embeddings and rich visualizations. Embeddings can then be leveraged as input features through machine learning libraries to develop methods predicting particular aspects of protein function and structure. Beyond the workflows included here, embeddings have been leveraged as proxies to traditional homology-based inference and even to align similar protein sequences. A wealth of possibilities remain for researchers to harness through the tools provided in the following protocols. © 2021 The Authors. Current Protocols published by Wiley Periodicals LLC. 
The following protocols are included in this manuscript:
Basic Protocol 1: Generic use of the bio_embeddings pipeline to plot protein sequences and annotations
Basic Protocol 2: Generate embeddings from protein sequences using the bio_embeddings pipeline
Basic Protocol 3: Overlay sequence annotations onto a protein space visualization
Basic Protocol 4: Train a machine learning classifier on protein embeddings
Alternate Protocol 1: Generate 3D instead of 2D visualizations
Alternate Protocol 2: Visualize protein solubility instead of protein subcellular localization
Support Protocol: Join embedding generation and sequence space visualization in a pipeline


Subjects
Artificial Intelligence; Deep Learning; Machine Learning; Natural Language Processing; Proteins
17.
ACS Synth Biol ; 9(8): 2154-2161, 2020 08 21.
Article in English | MEDLINE | ID: mdl-32649182

ABSTRACT

Short (15-30 residue) chains of amino acids at the amino termini of expressed proteins known as signal peptides (SPs) specify secretion in living cells. We trained an attention-based neural network, the Transformer model, on data from all available organisms in Swiss-Prot to generate SP sequences. Experimental testing demonstrates that the model-generated SPs are functional: when appended to enzymes expressed in an industrial Bacillus subtilis strain, the SPs lead to secreted activity that is competitive with industrially used SPs. Additionally, the model-generated SPs are diverse in sequence, sharing as little as 58% sequence identity to the closest known native signal peptide and 73% ± 9% on average.


Subjects
Machine Learning; Protein Sorting Signals; Area Under Curve; Bacillus subtilis/metabolism; Bacterial Proteins/metabolism; Databases, Protein; ROC Curve
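The sequence-identity figures quoted in the abstract above can be made concrete with a small sketch. It assumes the two sequences have already been aligned to the same length, with '-' marking gaps, and it divides by the number of aligned columns. Other definitions of the denominator exist, and the abstract does not say which one the authors used, so the function name and conventions here are illustrative assumptions.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned columns holding identical residues.

    Assumes both sequences are already aligned to equal length, with '-'
    marking gaps. Columns where both sequences have a gap are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as a match only when both residues are equal and not gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Made-up peptide fragments that differ at one of six positions.
    print(percent_identity("MKKLLA", "MKRLLA"))
```

The closest native sequence for a generated peptide would then be the database entry that maximizes this score after alignment.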
18.
Investig Clin Urol ; 61(Suppl 1): S23-S32, 2020 02.
Article in English | MEDLINE | ID: mdl-32055751

ABSTRACT

Distal ureteral reconstruction for benign pathologies such as stricture disease or iatrogenic injury has posed a challenge for urologists as endoscopic procedures have poor long-term outcomes, requiring definitive open reconstruction. Over the past decade, there has been an increasing shift towards robot-assisted laparoscopy (RAL) with multiple institutions reporting their outcomes. In this article, we reviewed the current literature on RAL distal ureteral reconstruction, focusing on benign pathologies only. We present peri-operative data and outcomes on the most common technique, ureteral reimplantation, as well as adjunct procedures such as psoas hitch and Boari flap. Additionally, we present alternative techniques reported in the literature with some technical considerations. Lastly, we describe the outcomes of the comparative studies between open, laparoscopy, and RAL. Although the body of literature in this field is limited, RAL reconstruction of the distal ureter appears to be safe, feasible, and with some advantages over the traditional open approach.


Subjects
Laparoscopy/methods; Robotic Surgical Procedures; Ureter/surgery; Ureteral Diseases/surgery; Humans; Urologic Surgical Procedures/methods
19.
Nat Methods ; 16(11): 1176-1184, 2019 11.
Article in English | MEDLINE | ID: mdl-31611694

ABSTRACT

We engineered light-gated channelrhodopsins (ChRs) whose current strength and light sensitivity enable minimally invasive neuronal circuit interrogation. Current ChR tools applied to the mammalian brain require intracranial surgery for transgene delivery and implantation of fiber-optic cables to produce light-dependent activation of a small volume of tissue. To facilitate expansive optogenetics without the need for invasive implants, our engineering approach leverages the substantial literature of ChR variants to train statistical models for the design of high-performance ChRs. With Gaussian process models trained on a limited experimental set of 102 functionally characterized ChRs, we designed high-photocurrent ChRs with high light sensitivity. Three of these, ChRger1-3, enable optogenetic activation of the nervous system via systemic transgene delivery. ChRger2 enables light-induced neuronal excitation without fiber-optic implantation; that is, this opsin enables transcranial optogenetics.


Subjects
Channelrhodopsins/genetics; Machine Learning; Optogenetics; Protein Engineering/methods; Animals; Channelrhodopsins/physiology; HEK293 Cells; Humans; Mice; Mice, Inbred C57BL
20.
Nat Methods ; 16(8): 687-694, 2019 08.
Article in English | MEDLINE | ID: mdl-31308553

ABSTRACT

Protein engineering through machine-learning-guided directed evolution enables the optimization of protein functions. Machine-learning approaches predict how sequence maps to function in a data-driven manner without requiring a detailed model of the underlying physics or biological pathways. Such methods accelerate directed evolution by learning from the properties of characterized variants and using that information to select sequences that are likely to exhibit improved properties. Here we introduce the steps required to build machine-learning sequence-function models and to use those models to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to the use of machine learning for protein engineering, as well as the current literature and applications of this engineering paradigm. We illustrate the process with two case studies. Finally, we look to future opportunities for machine learning to enable the discovery of unknown protein functions and uncover the relationship between protein sequence and function.


Subjects
Algorithms; Directed Molecular Evolution; Machine Learning; Models, Biological; Protein Engineering/methods; Proteins/metabolism; Humans; Proteins/genetics
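The selection step described in the abstract above, using a trained sequence-function model to choose which variants to test next, can be sketched as a simple ranking. Everything here is a hypothetical illustration: `select_top_candidates` and `toy_predict` are invented names, and the alanine-counting scorer merely stands in for a model that would in practice be trained on measured variants.

```python
from typing import Callable, Iterable, List


def select_top_candidates(
    candidates: Iterable[str],
    predict: Callable[[str], float],
    batch_size: int,
) -> List[str]:
    """Rank candidate sequences by predicted fitness and keep the best ones."""
    ranked = sorted(candidates, key=predict, reverse=True)
    return ranked[:batch_size]


def toy_predict(sequence: str) -> float:
    """Placeholder surrogate model: scores a sequence by its alanine count."""
    return float(sequence.count("A"))


if __name__ == "__main__":
    library = ["MKTA", "MAAA", "MKTT", "MATA"]  # made-up variants
    print(select_top_candidates(library, toy_predict, batch_size=2))
```

In a multi-round campaign, the selected batch would be measured experimentally, the new data added to the training set, and the model refit before the next selection.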